Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Sequencing and Raw Sequence Data Quality Control ◾ 21

On a Linux terminal, we can run FastQC non-interactively and later display the generated reports in a web browser. But before running FastQC, it is important to know about the “limits.txt” file in the “Configuration” directory. This file sets the warning and error thresholds for the FastQC analysis modules, and we can use it to determine which reports to generate. Use a text editor of your choice to open that file and study its content. In most cases, no change is needed. At this point, we will change only

kmer ignore 1 to kmer ignore 0

Then, save the file and exit. This change is necessary to include the k-mer report when we run the program.
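
The same edit can also be scripted instead of made by hand. The following is a minimal sketch, assuming the relevant line in “limits.txt” consists of the three fields kmer, ignore, and 1 separated by spaces or tabs. The function name enable_kmer_module and the path in the usage comment are hypothetical and depend on where FastQC is installed. A backup copy with the extension “.bak” is kept next to the edited file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FastQC): switch the k-mer module on
# by changing "kmer ignore 1" to "kmer ignore 0" in a limits file.
enable_kmer_module() {
    local limits_file=$1
    # Only the line whose fields are kmer / ignore / 1 is touched;
    # the original file is preserved as "<limits_file>.bak".
    sed -E -i.bak \
        '/^kmer[[:space:]]+ignore[[:space:]]+1[[:space:]]*$/ s/1[[:space:]]*$/0/' \
        "$limits_file"
}

# Usage (the path is an assumption; adjust it to your installation):
#   enable_kmer_module FastQC/Configuration/limits.txt
```

Scripting the change makes it repeatable on another machine, and the backup file allows the default setting to be restored if needed.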

The basic syntax for running the FastQC program non-interactively on the command line is:

fastqc seqfile1 seqfile2 .. seqfileN

The input can be a single FASTQ file name or multiple file names separated by whitespace.

The FastQC program has several options that can be displayed using the following

command:

fastqc --help

Since we have downloaded the eight E. coli raw FASTQ files above and stored them in the “fastQC” directory, we can either run the program on each file separately or provide all file names as input, as shown in the syntax above. However, a more efficient way is to use bash commands if we are on a Linux/Unix platform. The following bash script creates a directory “qc”, changes to the “fastQC” directory where the FASTQ files are stored, stores the file names in a variable “filenames”, runs the FastQC program non-interactively so that the QC reports are saved in the “qc” directory, and finally returns to the parent directory:

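# Create the output directory (no error if it already exists)
mkdir -p qc
cd fastQC
# Store the FASTQ file names in a bash array
filenames=(*.fastq)
# Run FastQC on all files with 3 threads; reports go to ../qc
fastqc "${filenames[@]}" \
    --outdir ../qc \
    --threads 3
cd ..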

We can also pass the wildcard pattern to FastQC directly and let the shell expand it:

mkdir -p qc
cd fastQC
fastqc *.fastq --outdir ../qc --threads 3
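
The steps above can also be wrapped in a small function that stops early when there is nothing to process. This is only a sketch under stated assumptions: the function name run_fastqc_batch and its three arguments (input directory, output directory, number of threads) are hypothetical, and only the --outdir and --threads options already used above are passed to FastQC.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of FastQC): run FastQC on every
# *.fastq file in a directory and write the reports to another one.
run_fastqc_batch() {
    local fastq_dir=$1 out_dir=$2 threads=${3:-3}
    local files=("$fastq_dir"/*.fastq)

    # If the pattern matches nothing, bash leaves it unexpanded,
    # so the first array entry will not exist as a file.
    if [ ! -e "${files[0]}" ]; then
        echo "No .fastq files found in $fastq_dir" >&2
        return 1
    fi

    # Stop with a clear message if the fastqc executable is not on PATH.
    if ! command -v fastqc >/dev/null 2>&1; then
        echo "fastqc not found on PATH" >&2
        return 127
    fi

    mkdir -p "$out_dir"
    fastqc "${files[@]}" --outdir "$out_dir" --threads "$threads"
}

# Usage, with the directory names used in this section:
#   run_fastqc_batch fastQC qc 3
```

Because the function takes the directories as arguments, it does not need to change the working directory, so the shell stays where it was when the run finishes.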

The QC reports of the FASTQ files will be stored in the “qc” directory. FastQC will generate an HTML file “*_fastqc.html” and a zipped file “*_fastqc.zip” for each FASTQ file.
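
After a batch run, it can be useful to confirm that every input file produced both outputs. The following is a minimal sketch, assuming the input files end in “.fastq” and the reports follow the “*_fastqc.html” and “*_fastqc.zip” naming described above. The function name missing_reports is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FastQC): print each FASTQ file that
# lacks its HTML report or its zip archive in the QC directory.
missing_reports() {
    local fastq_dir=$1 qc_dir=$2 fq base rc=0
    for fq in "$fastq_dir"/*.fastq; do
        [ -e "$fq" ] || continue              # pattern matched no files
        base=$(basename "$fq" .fastq)         # e.g. sample.fastq -> sample
        if [ ! -f "$qc_dir/${base}_fastqc.html" ] ||
           [ ! -f "$qc_dir/${base}_fastqc.zip" ]; then
            echo "$fq"
            rc=1
        fi
    done
    # Exit status 0 means every file has both reports.
    return "$rc"
}

# Usage, with the directory names used in this section:
#   missing_reports fastQC qc && echo "All reports present"
```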